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A recently introduced Importance Sampling strategy based on a least squares optimiza- 
tion is applied to the Monte Carlo simulation of Libor Market Models. Such Least Squares 
Importance Sampling (LSIS) allows the automatic optimization of the sampling distribu- 
tion within a trial class by means of a quick presimulation algorithm of straightforward 
implementation. With several numerical examples we show that LSIS can be extremely 
effective in reducing the variance of Monte Carlo estimators often resulting, especially 
when combined with stratified sampling, in computational speed-ups of orders of mag- 
nitude. 



1. Introduction 

The level of sophistication of the models employed by investment firms for pricing 
derivative securities is dramatically increasing in the continuous search for a possible 
edge against competitors. As a result, most of the models used in practice is too 
complex to be treated by analytic or deterministic numerical methods, and Monte 
Carlo simulation becomes more often than ever the only feasible means of pricing 
and hedging. 

The main limitation of Monte Carlo simulations is their computational cost. In 
fact, being stochastic in nature, their outcome is always affected by a statistical 
error, that can be generally reduced to the desired level of accuracy by iterating 
the calculation for long enough time. This comes with a high computational cost as 
such statistical uncertainties, all things being equal, are inversely proportional to 
the square root of the number of statistically independent samples. Hence, in order 
to reduce the error by a factor of 10 one has to spend 100 times as much computer 
time. For this reason, to be used on a trading floor, Monte Carlo simulations often 
require to be run on large parallel computers with a high flnancial cost in terms of 
hardware, infrastructure, and software development. 

Several approaches to speed-up Monte Carlo calculations, such as Antithetic 
Variables, Control Variates, and Importance Sampling, have been proposed over 
the last few years [5]. These techniques aim at reducing the variance per Monte 
Carlo observation so that a given level of accuracy can be obtained with a smaller 
number of iterations. In general, this can be done by exploiting some information 
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known a priori on the structure of the problem at hand, hke a symmetry property 
of the Brownian paths (Antithetic Variables), the value of a closely related security 
(Control Variates), or the form of the statistical distribution of the random samples 
(Importance Sampling). Antithetic Variables and Control Variates are the most 
commonly used variance reduction techniques, mainly because of the simplicity of 
their implementation, and the fact that they can be accommodated in an existing 
Monte Carlo calculator with a small effort. However, their effectiveness varies largely 
across applications, and is sometimes rather limited [6]. 

On the other hand. Importance Sampling techniques, although potentially more 
powerful, have not been employed much in professional contexts until recently. This 
is mainly because they generally involve a bigger implementation effort. Moreover, 
when used improperly. Importance Sampling can increase the variance of the Monte 
Carlo estimators, thus making its integration in an automated environment more 
delicate. Nonetheless, the potential efficiency gains at stake are so large that the 
interest in finding efficient Importance Sampling schemes is still very high. 

The idea behind Importance Sampling is to reduce the statistical uncertainty of 
a Monte Carlo calculation by focusing on the most important sectors of the space 
from which the random samples are drawn. Such regions critically depend on both 
the random process simulated, and the structure of the security priced. For instance, 
for a deep out-of-the money Call option [T3] , the payoff sampled is zero for most of 
the iterations of a Monte Carlo simulation. Hence, simulating more samples with 
positive payoff reduces the variance. This can be done by changing the probability 
density from which the samples are drawn, and reweighing the payout function by 
the appropriate likelihood-ratio (Radon-Nikodym derivative) in order to produce 
an unbiased result of the original problem 6 . 

Most of the work in Importance Sampling methods for security pricing has 
been done in a Gaussian setting |17I2I21I7I8I18I19I1I10| such the one arising from 
the simulation of a diffusion process. In this framework. Importance Sampling is 
achieved by modifying the drift term of the simulated process in order to drive the 
Brownian paths towards the regions that are the most important for the evaluation 
of the security. For instance, for the Call option above, this can be obtained by 
increasing the drift term up to a certain optimal level |17|2j . The various approaches 
proposed in the literature, essentially differ in the way in which such change of drift 
is found, and can be roughly divided into two families depending on the strategy 
adopted. The first strategy, common to the so-called adaptive Monte Carlo methods 
|21|18|19Tl] . aims to determine the optimal drift through stochastic optimization 
techniques that typically involve an iterative algorithm. On the other hand, the 
second strategy, proposed in a remarkable paper by Glasserman, Heidelberger, and 
Shahabuddin (GHS) [7], relies on a deterministic optimization procedure that can 
be applied for a specific class of payouts. 

In a recent paper [5], we introduced the Least Squares Importance Sampling 
(LSIS) technique, as an alternative and flexible variance reduction strategy for 
Monte Carlo security pricing. This approach, originally proposed in Physics for 
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the optimization of quantum mechanical wave functions of correlated electrons |20j , 
was shown in Ref. [5] to provide an effective tool also for financial applications. 
In LSIS the determination of the optimal drift ~ or more in general of the most 
important regions of the sample space - is formulated in terms of a least squares 
minimization. This technique can be easily implemented and included in an existing 
Monte Carlo code, and simply relies on a standard least square algorithm for which 
several optimized libraries are available. 

In this paper we apply the LSIS strategy to the simulation of a multi-factor Li- 
bor Market Model, and test its effectiveness on a variety of contracts. In addition, 
to further increase the computational efficiency we combine LSIS with stratified 
sampling [llj . The resulting variance reduction strategy is shown to be quite effec- 
tive in a variety of cases, providing computational speed-ups of up to two orders of 
magnitude. 

In the following Section, we begin by discussing the simulation setting to which 
we apply the LSIS strategy. Then in Section [3] we review the main ideas behind 
Importance Sampling, and the principal approaches proposed in the financial lit- 
erature. The rationale of LSIS is discussed in Section 2] together with the essential 
implementation details, and in Section [5] we illustrate how to combine LSIS with 
stratified sampling. Sections [S] and [7] discuss the Libor Market Model setting, and 
present the numerical results obtained with LSIS in this case. Finally, we draw our 
conclusions in Section [S] 



2. The Setting 

Although the variance reduction technique we discuss in this paper can be applied to 
a variety of financial problems, in the following we will focus on pricing applications 
that involve the simulation of multi-dimensional diffusions of the form 

dX{t) = ^i{X{t),t)dt + a{X{t),t)dWt . (2.1) 

Here the process X{t) and the drift /i(X, t) are both L-dimensional real vectors, Wt 
is a A'^-dimensional standard Brownian motion, and the volatility, cr(X, t), is a, Lx N 
real matrix. We will consider the problem of estimating the value at time t = 0, of 
contracts depending on the path followed by X(t) within a certain interval [0,T]. 
This is given by the expectation value under the risk neutral probability measure, 
P [12] of the (discounted) payout functional G[X{T)] 

V^Ep[G[X{T)]] . (2.2) 

Continuous time processes of the form (|2.ip are typically simulated by sampling 
X{t) on a discrete grid of points, = <o < < • ■ • < ^Af = by means, for 
instance, of a Euler scheme |fj 

X,+i =X, + m(X„ t) AU + a{X,, t) y/M^Z.+i , (2.3) 



''The use of other discretization schemes does not alter the present discussion. 
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Fig. 1. Sampling probability density functions for a European Call option 112.61 1 with T = 1, 
T = 0.05, (T = 0.3, Xq = X = 50 as obtained with LSIS [optimizing just the drift, LSIS{/i), and 
both the drift and the volatility, LSIS(/i, cr)], and the saddle point approximation of Ref. 7_ (GHS). 
On this scale the results for LSIS(/i) and GHS are indistinguishable. The original l|2.5ll and the 
optimal 1 13. 4| I sampling densities are also shown for comparison. 



where Xi = X(ti), ISti = i^+i — i^, and Zi^\ is a A^-dimensional vector of inde- 
pendent standard normal variates. In this representation, each discrctized path for 
the vector process X(t) can be put into a one to one correspondence with a set 
of d = iV X M independent standard normal variables Z. As a result, the original 
problem of evaluating the expectation value of a functional of the realized path of 
the process X{t) can be formulated as the calculation of expectation values of the 
form 



V ^Ep{G[Z)\= JdZG{Z)P{Z) 



(2.4) 



where G{Z) = G{Zi,...,Zd) is the scalar function obtained by discretizing the 
payout functional G[X(T)] on a mesh of d sampling points, and the density is given 
by a c?-dimensional standard normal distribution 

PiZ) = N{0, Id) = (27r)-'^/2 g-zV2 ^ (2.5) 

where Z^ = Z ■ Z. For instance, for the familiar Call option in the Black-Scholes 
framework [T3J one has d^l, P{Z) = [2tx)-^I'^ cxp (-^2/2) and 

+ 



G{Z) 



-tT 



Xq exp 



a 
~2 



T 



TZ 



K 



(2.6) 
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where r is the risk-free interest rate, a is the volatihty, Xq and K are respectively 
the spot and strike price, and T the maturity of the option. 

Whenever the dimension d of the state variable Z is large (say d > 5) stan- 
dard numerical quadrature approaches become highly inefficient, and Monte Carlo 
methods are the only feasible route for estimating expectation values of the form 
(|2.4|) . To do so, one interprets Eq. p.4p as a weighted average of the payout func- 
tion G{Z) over the possible configurations Z with weights given by the probability 
density P{Z). This immediately leads to the simplest (and crudest) Monte Carlo 
estimator which is obtained by averaging the payout function over a sample of Np 
independent values of the random variable Z generated according to the probability 
density P{Z), 

1 

^ =^ ^ = ]^ E - ^(^) • (2-7) 

^ i—l 

In particular, the central limit theorem [141 ensures that, for big enough samples, 
the values of the estimator V are normally distributed around the true value, and 
converge for Np — > oo towards V namely 

i=i V^^p 

where = Ep [G(a;)^] — Ep [G(x)]^ is the variance of the estimator and can be 
similarly approximated by 

Np 

^'^]^E(G(^0-^)' • (2.9) 

P i—l 

Although Eq. (|2.8p ensures the convergence of the Monte Carlo estimator to the ex- 
pectation value (|2.4p . its practical utility depends on the magnitude of the variance, 
T? . Indeed, the square root convergence in (|2.8p . implies that the number of repli- 
cations Np that are (asymptotically) necessary to achieve a given level of accuracy 
is proportional to the variance of the estimator [^. Roughly speaking, such quantity 
is relatively small whenever the function G{Z) is approximately constant over the 
region of values of Z that is represented the most among the random samples, i.e., 
the region that contains most of the probability mass of P{Z). This is generally not 
the case for most of the pricing problems encountered in practice, and the calcula- 
tion of accurate estimates of the expectation value (12. 4p may require large sample 
sizes Np, thus becoming computationally demanding. 

''In particular, the Monte Carlo integration becomes unfeasible if the variance of the estimator 
diverges, giving rise to the so-called sign-problem instability. Although this problem is the crux of 
Monte Carlo simulations in several branches of the Physical Sciences, see, e.g., S. Sorella and L. 
Capriotti, Physical Review B 61, 2599 (2000), this issue does not usually affect financial contexts. 
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3. Importance Sampling 

The key observation underlying Importance Sampling is that the choice of extracting 
the random variable Z according to the probability density P(Z) in order to sample 
stochastically Eq. (12. 4p . although natural, is by no means the only possible one. 
Indeed, the Monte Carlo integration can be performed by sampling an arbitrary 
probability density P{Z) provided that the integral is suitably reweighed. In fact, 
using the identity 

jdZG{Z)P{Z)= JdZ^^^^PiZ), (3.1) 

an alternative estimator of the expectation value (12. 4p is readily found as 

1 

= Z,-^PiZ), (3.2) 

with the weight function given by W{Z) = P{Z)/P{Z). The variance of the new 
Monte Carlo estimator reads 

= jdZ {W{Z) G{Z) - Vf P{Z) (3.3) 

and critically depends on the choice of the sampling probability density P{Z). For 
non- negative functions G{Z), the optimal choice of P{Z) is the one for which S 
vanishes, namely: 

Popt(^)--^G(Z)P(Z) . (3.4) 

In fact, the Monte Carlo estimator corresponding to such optimal sampling density 
reads 

^^WT. W{Z.)G{Z,) - ^ E ^ ' (3-5) 

leading to a constant value V on each Monte Carlo replication, and resulting there- 
fore in zero variance 0. Unfortunately, such a choice is not really viable as the 
normalization constant, V, is the expectation value (j2.4p we want to calculate in 
the first place. Nevertheless, this observation provides the useful indication that the 
sampling density P{Z), modulus a normalization, should be as close as possible to 
the product of the payout G{Z) and the original multi-variate Gaussian distribution 

In this respect. Importance Sampling strategies generally choose a family of 
trial probability densities, Pe{Z) - depending on a set of Ng real parameters 

'^It is possible to show 16 that, when G{Z) does not have a definite sign, the optimal sampling 
density has the similar form Popt = \G{Z)\P{Z) /V , although in this case the resulting variance is 
not zero. 
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6 — (61,62, .■■ , 6isig ) - and aim at determining the one that minimize the vari- 
ance of the estimator p.3p within the class. In particular, Importance Sampling 
methods in security pricing generally try to guide the sampled paths towards the 
most important regions of the configuration space (i.e., where the contribution of 
the integrand is the largest), by means of a change of the drift terms of the process 
(|2.ip or (|2.3p . The corresponding trial probability density reads 



where /2 is a c?-dimensional vector, and the weight function, as also expected from 
the Girsanov theorem [15], is 



A variety of approaches for the determination of the drift vector /2 minimizing 
the variance of the estimator (|3.3p has been recently proposed in the literature 



strategy adopted. 

The first strategy, common to the so-called adaptive Monte Carlo methods, is 
based on a stochastic minimization of the variance. Such minimization differs in 
details in the various methods but always involves an iterative procedure, to be 
performed in a preliminary Monte Carlo simulation. 

In particular, Su and Fu |18ll9j . building upon previous work by Vazquez- Abad 
and Dufresne [21], used a gradient-based stochastic approximation, dubbed infinites- 
imal perturbation analysis, in order to estimate the optimal uniform shift of the drift 
for the diffusion (|2.3p . minimizing the variance of the estimator (|3.3p . In the no- 
tation of this Section, this translates in working with a trial density of the form 
(|3.6p where the drift vector jl has components all equal to a single optimization 
parameter. The improvement of this method with respect to the one of Ref. [211, is 
that the minimization is carried out under the original probability measure, while 
in the latter the minimization was formulated under the trial probability measure. 
As a result, the stochastic minimization applies also for non differentiable payout, 
thus making the approach more general. The application of this technique to par- 
tial average Asian options in a Black-Scholes market, and to Caplets under the 
Cox-IngersoU-Ross model provides significative variance reductions [18|19j . 

Along similar ideas, Arouna fT] has recently proposed a different stochastic opti- 
mization method for the determination of the optimal sampling density (|3.6p . Here, 
in contrast to the previous approach, all the components of the drift vector are 
independently optimized. The method relies on a truncated version of the Robbins- 
Monro algorithm that is shown to converge asymptotically to the optimal drift, and 
to provide an effective variance reduction in a variety of cases. 

On the other hand, the alternative strategy for the optimization of the trial 
density (|3.6p . proposed by Glasserman, Heidelberger, and Shahabuddin T], relies 
on a saddle point approximation to minimize the variance of the estimator p.3p , or 



(3.6) 



M^^(Z) =exp [-m-^ + AV2] 



(3.7) 




. These can be roughly classified into two families depending on the 
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equivalently of its second moment (in the original measure) 
m2(A) - [dZ Wp,{Z)G{Zf P{Z) . 



(3.8) 



In fact, if the payout function G{Z) is positive definite, by defining F{Z) — \ogG{Z) 
one can approximate Eq. p.8p with the zero-order saddle point expansion 



where C is a constant. As a result, within this approximation, the problem of 
determining the optimal change of drift boils down to finding the vector such 
that 



is minimum. It is easy to show that this is obtained by choosing jl* = Z* where Z* 
is the point that solves the optimization problem 



or equivalently, for which the payout times the original density, G{Z)P{Z), is 
maximum, i.e., Z* corresponds to the maximum of the optimal sampling density, 
Eq. (|3.4p . The simplest interpretation of the saddle point approach is therefore that 
it approximates the zero variance density by means of a normal density with the 
same mode and variance. 

This approach has been recently generalized to the continuous time in the Black- 
Scholes framework in a recent work by Guasoni and Robertson [10] . This formulation 
allows one to express the problem of the determination of the optimal drift in terms 
of a one-dimensional variational problem, and the solution of a Euler Lagrange 
equation. 

The saddle point approach can be expected to be particularly effective in re- 
ducing the variance of the Monte Carlo estimator whenever the log payout function 
F{Z) is close to be linear in the portion of the configuration space where most of 
the probability mass of P{Z) lays. However, whenever the optimal sampling prob- 
ability (|3.4p cannot be accurately represented by a single Gaussian with the same 
mode and variance, the saddle point approximation is less beneficial. In particular, 
this approach turns out to be less effective whenever the structure of the payout 
function G{Z) is such that the optimal sampling density (|3.4p has a width which is 
very different from the one of the original density, or is multi-modal. 

In the following Section we describe an alternative least squares strategy that is 
straightforward to implement and flexible enough to be applied in a generic Monte 
Carlo setting. Indeed, the Least Squares Importance Sampling (LSIS) is not limited 
to the determination of the optimal change of drift in a Gaussian model. Instead, 
it can be applied to any Monte Carlo simulation provided that a reasonable guess 




max 



(3.9) 



max 



{F{Z)-Z^/2) , 



(3.10) 
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Table 1. Variance reductions 14.61 1 obtained with different Importance Sampling strategies. Com- 
parison between LSIS, the adaptive Robbins-Monro (RM) algorithm (as quoted in Ref. jl]), and 
the saddle point approach of Ref. 7j (GSH): price of a European Call option on a lognormal asset 
12.61 1 for different values of the volatility a, and of the strike price K. The parameters used are 
r = 0.05, Xo = 50, T = 1.0, and the number of simulated paths is 1,000,000 for Crude MC, 
LSIS and GHS, 50,000 for RM. Results for LSIS obtained by optimizing the drift only [LSIS(/i)], 
and both the drift and the volatility [LSIS{/^, <t)] are reported. The uncertainties are reported in 
parentheses. 



a 


K 


LSIS(/i) 


LSIS(/2, a) 


RM 


GHS 


0.1 


30 


104(1) 


1700(100) 


112(4) 


100(1) 




50 


7.8(1) 


15(1) 


7.8(4) 


7.8(1) 




60 


33.5(5) 


84(5) 


31(2) 


33.5(5) 


0.3 


30 


16.4(1) 


51(1) 


16.8(4) 


14.8(2) 




50 


9.9(5) 


27(1) 


11(2) 


9.9(1) 




60 


15.6(1) 


35(1) 


15.2(4) 


14.2(1) 


Table 2. 


Same as Table [Tl for a European Put 


option. 


a 


K 


LSIS(/i) 


LSIS(/i, a) 


RM 


GHS 


0.1 


40 


435(6) 


571(9) 


350(24) 


435(6) 




50 


8.8(1) 


25(2) 


9.6(4) 


9.1(1) 




60 


5.9(1) 


17(1) 


6.3(4) 


5.9(1) 


0.3 


30 


41(1) 


69(2) 


38(4) 


40.8(5) 




50 


5.8(1) 


16.5(5) 


6.2(4) 


5.8(1) 




60 


4.9(1) 


13.9(2) 


4.8(4) 


4.4(1) 



of the optimal sampling density is available. For this reason, in the next Section we 
will momentarily leave the Gaussian framework, and we will describe the rationale 
of LSIS in a more general setting. 

4. Least Squares Importance Sampling 

A practical approach to the search of an effective Importance Sampling density 
can be formulated in terms of a non-linear optimization problem. To this purpose, 
let us consider the family of trial probability densities, Pe{Z). The variance of the 
estimator corresponding to Pe{Z), Eq. p.3p . can be written in terms of the original 
probability density P{Z) as 

±1 = Ep [Ws{Z)G^Z)] - Ep [G{Z)f , (4.1) 

with W0{Z) = P{Z) / Po{Z). Hence, the optimal Importance Sampling density 
within the family Pe{Z) is the one for which the latter quantity, or equivalently 
the second moment p.8p or 

Ep[We{Z)G^{Z)] , (4.2) 
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is minimum. The crucial observation is that the Monte Carlo estimator of this 
quantity, 

K 2 

^ T77 E (WeiZ.y^'GiZ,)) ^ P{Z) , (4.3) 

can be interpreted as a non-linear least squares fit of a set of N'p data points {xi, yi) 
with a function y = fe{x) parameterized by 9, with the correspondence yi 0, 
Xi ^ Zi, and fe{x) We[ZY/'^G{Z). The latter is a standard problem of statistical 
analysis that can be tackled with a variety of robust and easily accessible numerical 
algorithms, as the so-called Levenberg-Marquardt method [IB] . 

Alternatively, to improve the numerical stability of the least-squares procedure, 
it is convenient in some situations to minimize, instead of (j4.2p . the pseudo- variance 



S2{e) = Ep 



We{ZY''^G{Z)-VT 



K 2 
=^1^E(^«(^0'/'G(Z0-^t) (4.4) 

where the constant is a guess of the option value. Indeed, the minimization of 
(|4.4p is equivalent to the one of the real variance of the estimator (|4.H) as 

^2(0) = S2 + {Ep \G{Z)\ - Vrf . (4.5) 

The algorithm for the determination of the optimal sampling density within a 
certain trial family can be therefore summarized as it follows: 

(1) Generate a suitable number N'^ of replications of the state variables Z according 
to the original probability density P{Z); 

(2) Choose a trial probability density P0{Z), and an initial value of the vector of 
parameters 9; 

(3) Set x^ Z„ fe{x) W0{Z)^^^G{Z) and yi ^ (resp. yi Vp) and call a 
least squares fitter, say LSQ [x,y, fg{X),9] , providing the optimal 9 = 9* hy 
minimizing the second moment of the estimator m2(0), Eq. (|4.3p [resp. 5*2(0), 
Eq. g31)]. 



Once the optimal parameters 9* have been determined through the least squares 
algorithm, one can perform an ordinary Monte Carlo simulation by sampling 
the probability density Pei,{Z), and calculating expectation values according to 
Eq. (Ha. 

What makes LSIS a practical strategy is that just a relatively small number 
of replications <C Np is usually required to determine the optimal parameters 
9*. This is due to the fact that the configurations over which the optimization 
is performed are fixed. As a result of this form of correlated sampling [20] . the 
difference in the 7712(0) 's for two sets of values of the parameters being optimized is 
much more accurately determined than the values of the 7712 (0)'s themselves. This 
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rather surprising feature is rooted in the fact that the minimization of Eq. (|4.3p as 
a means to optimize the trial density, Pe{Z), can be justified in terms of a genuine 
maximum likelihood criteria [1], and it is therefore independent on how accurately 
m2{9) approximates the quantity (|4.2p . As a result, the overhead associated with 
the optimization of the trial density is generally fairly limited, thus making LSIS a 
practical approach for variance reduction. 

In a companion paper [5] we have demonstrated the effectiveness of LSIS by 
applying it to a variety of test cases. In particular, we have shown that LSIS provides 
variance reductions comparable or superior to those of the Importance Sampling 
methods most recently proposed in the financial literature |7|18|19|1] . As a simple 
example, for instance, below we briefly review the results obtained for standard Call 
and Put options in a Black-Scholes setting. In this case the payout function reads as 
in Eq. \2.Q\ (for the call), and the sampling density P{Z) is a univariate standard 
normal density. 

As discussed above. Importance Sampling techniques seek a sampling probability 
density Pe{Z) as close as possible to the optimal samphng density, Eq. (|3.4p (see 
Figure[T|). The simplest choice for Pe{Z), in this setting, is a Gaussian density of the 
form p.6p (with d — 1), so that the only parameter 9 to optimize is the drift /i. We 
found that the least squares fitter was able to determine successfully the optimal /i 
with as little as ~ 50 Monte Carlo replications. 

In Tables [1] and [5] we compare the results obtained with LSIS with the ones 
obtained by means of the Robbins Monro (RM) adaptive Monte Carlo (as quoted 
in Ref. 1 ), and the saddle point approach of CHS 2|. Here, as an indicator of the 
efficiency gains introduced by the different strategies of Importance Sampling, we 
have defined the variance ratio as 



where the numerator and denominator are the statistical errors (for the same num- 
ber of Monte Carlo paths) of the Crude and the Importance Sampling estimators, 
respectively. 

We found that the different methods produce a significative and comparable vari- 
ance reduction. Intuitively, the change of drift is more effective for low volatility, and 
deep in and out of the money options (see also the discussion in the Introduction). 
In this case, the LSIS and CHS optimized trial densities Pfi{Z) are very similar as 
shown Fig.[T] This could be expected as, in this case, the optimal Importance Sam- 
pling density (|3.4p can be effectively approximated by a Gaussian with the same 
mode and variance, so that the GHS approach produces accurate results. 

However, the LSIS method is not limited to Importance Sampling strategies 
based on a pure change of drift, and one can easily introduce additional optimization 
parameters in the trial density. For instance, in this example it makes sense to 
introduce the sampling volatility, a, 




(4.6) 



Pf^MZ) = (27ra2)-V2e-(^-M)V25^ . 



(4.7) 
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As illustrated in Fig. [2 by adjusting both p. and a, one obtains a trial density closer 
to the optimal one. This corresponds to an additional variance reduction up to over 
one order of magnitude, as shown in Tables [1] and [2l 

5. Stratified Sampling 

In a diffusive setting, LSIS can be naturally combined with stratified sampling [11] 
in order to achieve further variance reductions. In this Section we illustrate how. 
We begin by reviewing the basic ideas underlying Stratification following Refs. [7l6] . 

Stratification is a technique that allows one to draw samples from a specified 
distribution in a more regular pattern thus reducing the variance. This is achieved 
by ensuring that the fraction of samples which falls in different subsets, or strata, 
of the domain of the random variable matches the theoretical probability of each 
subset. For example, in order to perform a stratified sampling of a single standard 
normal variable one can divide the real axis into M strata, such that the probability 
of the random variable to fall in any of them is 1/M. This can be done easily by 
first dividing the unit interval (0, 1) into M segments of length 1/M, and sampling 
uniformly from each of them. Then, each of the sampled uniform is mapped into 
a standard Gaussian by means of the inverse cumulative normal distribution. The 
resulting set of M variates will contain exactly one variable for each of the M strata 
of the real axis, and constitute therefore a stratified sample of the standard normal 
distribution. This simple algorithm can be therefore summarized as it follows: 

(1) Draw M random variables, say u^, . . . , u*^, uniformly distributed in (0, 1). 

(2) Define a new set of M random variables 

= ! , 

M M ' 

with i = 1, . . . , M, i.e., such that the i-th variable is uniformly distributed in 
the interval (i - 1/M, i/M). 

(3) Set 

where $ is the standard normal cumulative density function. The variables 
(X*^^', . . . , X*^*^-') constitute the sample of the standard normal distribution, 
stratified into M strata. 

Although this procedure can be generalized to multi-dimensional normal vari- 
ates, it becomes unpractical in high-dimension (d > 5) for the same reason for 
which estimating the integral (|2.4p by numerical quadrature becomes exponentially 
inefficient: if each dimension is divided into M strata, their total number scales as 
A/"*. As a result, generating just one point on each stratum requires a sample size at 
least this large, thus becoming prohibitive for the values of M > 10 that generally 
make Stratification effective in reducing the variance. 
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A feasible way of applying Stratification to the sampling of a multi-variate nor- 
mal distribution is to stratify only a specific one-dimensional projection of the ran- 
dom variable Z ^ N{0,ld). This is straightforward because, the projection of Z 
along a direction in represented by a unit vector ^, ^ • 2', is a standard normal 
variable that can be stratified using the one-dimensional algorithm described above. 
In addition, it is also easy to sample the vector Z conditional to a specific value of 
its projection ^ ■ Z, as the conditional distribution {Z\$, ■ Z = x) is itself normal and 
given by N{x£^, Id — £,£,*)■ The resulting algorithm leading to the stratification of Z 
along the direction ^ can be therefore summarized as it follows: 

(f) Generate a stratified sample of X'^^ . . . , X*^*^^ of the standard normal dis- 
tribution as described above. Interpret X*^'' as the the i-th value of the one- 
dimensional projection ■ Z, oi Z ^ N{Q,Id). 

(2) Draw AI independent d-dimcnsional Gaussian variates 

from N{0,ld)- 

(3) Set 

z(') ^ + {Id - . 

The resulting set (Z^^) , . . . , Z^*^' ) constitutes a sample from iV(0, Id) stratified along 
the direction ^ into M strata. 

Loosely speaking, the Stratification of a one-dimensional projection of a multi- 
dimensional normal variate has nearly the same effect of replacing the Monte Carlo 
integration with a numerical quadrature along the stratified direction ^, while still 
using Monte Carlo for the remaining ones. Clearly, the choice of the direction ^ is 
critical for the Stratification to be effective in terms of variance reduction. This is 
likely to be the case if the output is strongly correlated to the value of the projection 

As anticipated, the simplest possible strategy for Importance Sampling in a 
Gaussian framework, is to look for an optimal change of drift, i.e. to adopt the 
simple shifted Gaussian of Eq. (|3.6[) as trial probability density. In this setting, as 
suggested by Glasserman and collaborators [7] , a natural choice for the direction of 
stratification is the optimal drift vector itself. This can be rigorously justified if the 
payout is a function of a linear combination of the Zj's. However, in Refs. |7|8] and 
[5] it has been shown that this choice works in practice more in general, turning 
out to be highly effective in a variety of cases. In this paper, we also follow this 
strategy, and demonstrate its effectiveness for a variety of examples in the context 
of the Libor Market Model. 

6. The Libor Market Model Setting 

In the remainder of this paper we will apply the LSIS strategy, reviewed above, 
to the Libor Market Model of Brace, Gatarek and Musiela [3] for the arbitrage- 
free evolution of the forward Libor rates. In order to introduce this framework, we 
indicate with T^, i = 1, . . . , M -I- I, a set of M -I- 1 bond maturities, with spacings 
h = Ti+i — Ti, assumed constant for simplicity. The Libor rate as seen at time t for 
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the interval [Ti, Ti^i), Li{t), evolves according to the following stochastic differential 
equation 

= ^^,{L{t))dt + a,{tfdWt, 0<t<T,, z = l,...,M, (6.1) 

where is a A'^-dimensional standard Brownian motion, L(t) is the M-dimensional 
vector of Libor rates, and (7i {t) the A^-dimensional vector of volatilities, both at time 
t. Here the drift term, as imposed by the arbitrage free conditions, reads 

where rjit) denotes the index of the bond maturity immediately following time i, 
with T^(t)-i < t < Tjj{t). 

Equation (|6.ip can be simulated by applying a Euler discretization to the log- 
arithms of the forward rates, and by dividing each interval [Ti,Ti^i) into Ue steps 
of equal width, = h/ue. This gives 

Li{n + 1) 

for i = ri{nhe), M, and Li{n + 1) = Li{n) if i < ri{nh). Here Z is a A^- 
dimensional vector of independent standard normal variables. Under the discretized 
model (j6.3p . the problem of evaluating the price of a contract written on a set of 
Libor rates is then formulated in the general form (|2.4[) . and LSIS can be straight- 
forwardly applied. 

In the following we will present results using a trial probability density involving 
displaced Gaussian multi-variate densities of the form (|3.6|) . This choice requires in 
principle the optimization of a number of parameters - the components of the drift 
vector fl - proportional to the number of Gaussian univariate Zi necessary for the 
propagation of the Libor rates in the desired time horizon, namely d = M x N x rig. 
As the number of time steps or the number of factors of the simulation increase, the 
complexity of the optimization problem increases as well. Nevertheless, as suggested 
in Ref. [8] and verified in the companion paper [5] for a variety of examples, one can 
significantly reduce the computation time associated with the optimization stage by 
approximating the drift vector with a continuous function parameterized by a small 
number of parameters. These are in turn tuned by the least square algorithm in order 
to determine an approximate optimal drift vector. We have found that a particularly 
effective realization of this approach is to approximate the drift vector by a piecewise 
linear function, parameterized by its values where it changes slope (the so-called 
knot points). In particular, in the simulation of the LMM we have found that by 
using a very limited number of knot points for each random factor (say for 1 to 5) 
one is able to achieve very effective variance reductions through LSIS and LSIS plus 
Stratification. Hence the simulation of the LMM required the optimization of a very 
small number of parameters (form 3 to 15, for N = 3) thus making the overhead 
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Table 3. Variance reductions 1 14.61 1 obtained with LSIS and LSIS plus Stratification (LSIS+) for 
Caplets, Eq. 1 17.41 1. in a three factor Libor Market Model, for different maturities T^, and strike 
prices K. Nf; is the number of knots per factor (see text). The number of simulated paths is 
200,000. The uncertainties on the variance reductions are reported in parentheses. 



Tm (years) 


K 


Nk 


LSIS 


LSIS+ 


1.0 


0.04 


1 


11.4(1) 


1349(1) 


1.0 


0.055 


1 


13.3(2) 


2300(2) 


1.0 


0.07 


1 


20.2(1) 


4126(4) 


2.5 


0.04 


1 


14.0(1) 


1189(1) 


2.5 


0.055 


1 


15.5(1) 


897(1) 


2.5 


0.07 


1 


18.1(1) 


1831(1) 


5.0 


0.040 


1 


12.7(1) 


235.2(5) 


5.0 


0.060 


1 


12.5(1) 


237.0(5) 


5.0 


0.080 


1 


14.5(1) 


193.3(4) 


7.0 


0.04 


1 


7.9(3) 


40.0(1) 


7.0 


0.055 


1 


8.5(4) 


43.7(1) 


7.0 


0.07 


1 


8.5(4) 


40(1) 



associated with the presimulation stage rather hmited. More precisely, we found that 
a few hundred Monte Carlo configurations and 10-20 iterations of the least squares 
fitter, were typically enough to determine the optimal drift vector. In addition, such 
vector generally changes continuously with the simulation parameters. As a result, 
an even faster convergence in the iterative procedure can be obtained by starting 
the pre-simulation from a drift vector optimized for a case with a similar set of 
parameters. 

7. Numerical Results 

The numerical results we present in this Section are based on the evolution of (j6.3p 
in a three-factor {N = 3) model with h — 1/4 {a quarter of a year), and rtg = 3. 
Following Ref. [9], to keep things simple we take the volatilities to be functions of 
time to maturity 

a,{t) = CT,;_,,(t) + i(0) , (7.1) 

with 

ai{0) = aoil + aj){l+Pt) , (7.2) 

j = 1, . . . , 3, a = 0.1 and (3 = 0.01, and ao = 0.2. As initial Libor curve we take 
instead 



with ^0 = 5%. 



(7.3) 
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As a first example we consider a Caplet for the interval [Tm, Tm+i) struck at K, 



Table 13] displays the estimated variance ratios obtained with LSIS, and the combi- 
nation of LSIS and Stratification (LSIS-I-) introduced in Section [5] for a variety of 
maturities, and strike prices that range from in the money to out of the money. Here 
the results are all obtained using (|3.6p as trial probability density, and by param- 
eterizing the change of drift of each factor with a single parameter or knot point, 
corresponding to a rigid shift. We have verified that increasing the number of knots 
does not provide further sizable benefits in this case. As shown in Table [Sj LSIS 
provides remarkable variance reductions, corresponding to a saving of roughly one 
order of magnitude in computational time, consistently across maturities. For fixed 
maturity, as expected, LSIS is more effective for out of the money strikes since in 
these cases the fraction of paths expiring worthless is more significant. These paths 
clearly provide little information, and tend to increase the variance of the sample. 
Changing the drift increases the fraction of paths which end up in the money thus 
making the sample more homogeneous. Conversely, as the maturity increases, the 
variance reduction provided by LSIS decreases as the outturn distributions of the 
Libor rates become more delocalized, and the change of drift strategy becomes less 
effective. 

The combination of LSIS and Stratification provides for Caplets a tremendous 
variance reduction of up to two orders of magnitude (see Table |3l). However, the 
effectiveness of LSIS-I- decreases sharply with maturity. Nevertheless, for the exam- 
ples considered, it still gives around a factor of 40 in variance reduction for a 7 year 
maturity, thus resulting in extensive savings in computational time also for fairly 
long expiries. 

Although important instruments for calibration, Caplets constitute an easy test 
ground for LSIS and LSIS-f as they are mostly sensitive to the single Libor rate 
determining the final payment. A more articulated example on which to assess the 
efficacy of LSIS are interest rate Caps. We consider contracts with first payment T„ 
and last payment Tm, and tenor h 



The results obtained for a variety of maturities and strike prices are shown in 
Table ID In this case we have verified that Nk = 3 knot points provided the bulk 
of the variance reduction for the trial density function (|3.6p . The efficiency gains 
produced by LSIS, although slightly smaller than in the case of a single Caplet, are 
consistently around 10 — 15 for all the maturities considered. As expected, LSIS-I- is 
not able to provide the massive variance reductions observed for Caps. Nonetheless, 
for the cases considered, it provides a further reduction of the variance with respect 
to LSIS of a sizable factor ranging from 2 to 4. 




(7.4) 



M 




(7.5) 



l—n 
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Table 4. Variance reductions obtained with LSIS and LSIS plus Stratification (LSIS+) for Caps 
Eq. 17.511 in a three factor Libor Market Model, for T„ = 0.25 (years), different final maturities 
Tjf , and strike prices K. is the number of knots per factor (see text). The number of simulated 
paths is 200,000. The uncertainties on the variance reductions are reported in parentheses. 



Tm (years) 


K 


Nk 


LSIS 


LSIS+ 


1.0 


0.04 


3 


10.6(5) 


37.2(8) 


1.0 


0.055 


3 


9.7(3) 


19.8(5) 


1.0 


0.07 


3 


13.6(5) 


21.6(6) 


2.5 


0.04 


3 


16.2(5) 


40.3(7) 


2.5 


0.055 


3 


12.0(4) 


33.8(7) 


2.5 


0.07 


3 


15.7(5) 


47.3(8) 


5.0 


0.04 


3 


14.9(5) 


43.7(9) 


5.0 


0.055 


3 


14.5(6) 


46.7(9) 


5.0 


0.07 


3 


15.6(6) 


55(1) 


7.0 


0.04 


3 


13.0(6) 


42.6(8) 


7.0 


0.055 


3 


12.2(5) 


45.1(9) 


7.0 


0.07 


3 


12.6(4) 


55(1) 



Table 5. Variance reduction obtained with LSIS and LSIS plus Stratification (LSIS+) for Swaptions 
Eq. 17.611 in a three factor Libor Market Model. T,i is the option expiry and Tjv/^i is the final 
payment date of the underlying swap. K is the strike price. A^^ is the number of knots per factor 
(see text). The number of simulated paths is 200,000. The uncertainties on the variance reductions 
are reported in parentheses. 



Tn (years) 


Tm+i 


K 


Nk 


LSIS 


LSIS+ 


0.5 


1.5 


0.04 


3 


6.8(3) 


35.2(8) 


0.5 


1.5 


0.055 


3 


10.5(4) 


143(2) 


0.5 


1.5 


0.07 


3 


21.2(6) 


209(2) 


0.5 


2.5 


0.04 


3 


7.0(3) 


41.9(9) 


0.5 


2.5 


0.055 


3 


9.8(3) 


149(2) 


0.5 


2.5 


0.07 


3 


18.6(5) 


427(2) 


0.5 


5.5 


0.04 


3 


6.8(3) 


50(1) 


0.5 


5.5 


0.055 


3 


8.5(3) 


106(1) 


0.5 


5.5 


0.07 


3 


12.0(4) 


148(1) 


1.0 


6.0 


0.04 


3 


8.0(4) 


144(2) 


1.0 


6.0 


0.055 


3 


8.6(3) 


165(2) 


1.0 


6.0 


0.07 


3 


12.7(4) 


654(3) 


2.0 


7.0 


0.04 


3 


9.2(3) 


70(1) 


2.0 


7.0 


0.055 


3 


9.7(3) 


139(1) 


2.0 


7.0 


0.09 


3 


13.9(4) 


140(1) 


5.0 


10.0 


0.04 


5 


7.3(4) 


76(1) 


5.0 


10.0 


0.055 


5 


7.4(3) 


72(2) 


5.0 


10.0 


0.09 


5 


7.5(4) 


197(2) 
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LSIS and LSIS+ result in remarkable computational savings also for Swaptions. 
Here we have considered contracts with expiry r„ to enter in a swap with payments 
dates T„+i, . . . , Tm+i, with the holder of the option paying a fixed rate K 

M+l 

V{T^)^ J2 B{Tn,T,)hiSn{Tn) - K)+ , (7.6) 

i—n-\-l 

where B{T.n^ Ti) is the price at time of a bond maturing at time 



l=n 

and the swap rate reads 



Sn{Tn) = ■ (7.8) 

The results are shown in [5] and indicate that LSIS provides variance reductions in 
the range 7 20 and LSIS+ further increases the computational efficiency by up to 
one order of magnitude. 

As a final example - illustrating for a simple case the flexibility of LSIS - we 
consider the combination of a long Caplet and Flooret in a Straddle contract 



\i=0 



In this case, the optimal sampling density (see Secl3|), proportional to the prod- 
uct of the payout and the Gaussian sampling density (|2.5p . has two well separated 
maxima because of the modulus in Eq. (|7.9p . As a result, a single mode trial proba- 
bility density p.6p provides limited variance reductions, especially for strikes at the 
money, where the relative importance of the two maxima is similar (see Tab. [5]). 
However, the LSIS is not limited to a Gaussian trial density and one can use this 
flexibility to utilize a more accurate guess of the optimal sampling density. In par- 
ticular, a better ansatz for the optimal density is represented by a bi-modal trial 
density of the form 

P{Z) = (27r)-''/2 \wa e-(^-^»)'/2 + Wb e-'^^-f"'^^ , (7.10) 

where Wa + Wh ^ I that can be optimized over /i^, fJ-b, and Wa- The simulation 
of a density of this form is straightforward as it simply implies choosing one of 
the two Gaussian components in (|7.10p on each Monte Carlo step, and sample 
a configuration Zi according to it. This can be done by extracting an auxiliary 
uniform random number ^ G [0, 1], and sampling Zi according to the first Gaussian 
component if ^ < Wa, and according to the second otherwise. As shown in Table 
[SI using this trial density, LSIS improves significantly the computational efficiency 
also for Straddle contracts. 
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Table 6. Variance reduction obtained with LSIS for a Straddle Eq. I I7.9I I in a three factor Libor 
Market Model, for different maturities T^, and strike prices K. Ni^ is the number of knots per 
factor (see text). Results are shown using Eq. ||3^ [LSIS] and Eq. iTJOl [LSIS (MM)] as trial 
densities. The number of simulated paths is 200,000. The uncertainties on the variance reductions 
are reported in parentheses. 



T„ (years) 


K 


Nk 


LSIS 


LSIS (IVIIVI) 


1.0 


0.04 


1 


2.8(1) 


5.8(1) 


1.0 


0.05 


1 


1.3(1) 


5.3(1) 


1.0 


0.06 


1 


1.0(1) 


3.9(1) 


1.0 


0.07 


1 


1.1(1) 


3.4(1) 


5.0 


0.04 


1 


2.8(1) 


8.7(1) 


5.0 


0.05 


1 


1.9(1) 


6.5(1) 


5.0 


0.06 


1 


1.5(1) 


4.9(1) 


5.0 


0.07 


1 


1.2(1) 


4.0(1) 



8. Conclusions 

In this paper we have described the application of the recently introduced Least 
Squares Importance Sampling (LSIS) [5] to the simulation of Libor Market JVIodels. 
Such variance reduction technique allows one to automatically optimize the sam- 
pling density within a chosen trial class by means of a presimulation algorithm of 
straightforward implementation. 

What makes the approach practical in a financial context is that the overhead 
associated with the least squares optimization of the trial density is generally rather 
limited especially after reducing the dimensionality of the problem by means of a 
careful parametrization. 

With several numerical examples we have shown that LSIS can be extremely 
effective in reducing the variance per sample of the simulation, thus resulting in re- 
markable speed-ups. JVIoreover, when used with Gaussian trial probability densities, 
LSIS can be naturally combined with Stratification thus providing further efficiency 
gains that can result in computational savings of orders of magnitude. 

The efficacy of any Importance Sampling strategy is much dependent on how 
effectively the trial density function is able to reweigh the different regions of the 
sampled space in order to reduce the statistical fluctuations of the accumulated 
observables. These regions depends on both the model simulated, and the structure 
of the payout being priced. In this respect LSIS, when compared with previously 
methods, offers additional potential leeway as it is not limited to Gaussian trial 
densities. This becomes important when the structure of the optimal density is 
particularly complex e.g., with multi- modal features, or complicated correlation 
structures. In this paper we have illustrated this point with a simple multi-modal 
example. Further work is currently in progress in order to introduce more flexible 
probability distributions as trial densities. 
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